ABSTRACT

The potential constraints on a ring structured distributed
computing system imposed by the shipboard environment were
discussed. The feasibility of increasing distributed ring
system availability to meet the requirements was investigated.
It was shown that with a multiply linked ring structure,
shipboard environmental effects would not severely
degrade successful operation of a distributed system. This
finding could result in the utilization of distributed ring
computing systems with suitably redundant data path schemes
as highly reliable general purpose data processing systems
on shipboard platforms.

I. INTRODUCTION

The purpose of this thesis was to investigate the feasibility
of instituting a high degree of system availability
in ring structured distributed computing systems in order
to make them compatible with shipboard environments. A
significant amount of effort was given to introducing redundant
data links and highly redundant ring hardware in
achieving this higher state of availability. Prior to the
beginning of this study, the distributed ring computing
system concepts were studied to ensure that the problems
unique to this system study were well understood.

Although the concept of distributed ring computing
systems  is  relatively  recent,  significant  research  has  been 
conducted  both  in  the  civilian  academic  environment  and  at 
the  Naval  Postgraduate  School.   Chapter  II  examines  this 
background  information  and  gives  an  overview  and  functional 
description  of  distributed  ring  system  software  and  hardware 
concepts. 

In the earlier sections of Chapter III, the specific
rationale for investigating methods of increased system
availability and reliability in a shipboard environment
is developed. In the central sections of Chapter III,
specific designs of data path redundancy as possible ways
of increasing system availability in distributed ring systems
are discussed and the resultant system availability calculations
are derived. Considerable difficulty was
encountered in developing the complicated mathematical
models for one system design in particular. While several
approaches to system availability calculations were attempted,
only one produced reasonable results. A detailed
discussion of the system calculations developed can be found
in the latter sections of Chapter III, together with a discussion
of how the data path redundant designs affect
overall system availability.

Chapter IV discusses a potential redundant hardware
design of a data ring interface utilizing current microcomputer
technology. Finally, Chapter V summarizes the
work completed and states the conclusions of this thesis.

II. BACKGROUND

Given the notable advancements in computer system technology,
coupled with the decreasing costs of hardware, the
concept of establishing Distributed Ring Computing Systems
(DRCS) using present micro and mini-computer hardware has
become increasingly popular. The main features which make
this type of system more desirable than other possible
system configurations are its greater flexibility,
its reduced costs, and its processor-independent
communications protocols.

In view of the above mentioned advantages, applications
of ring structured computing systems to naval shipboard
designs appear highly desirable. Historically, the particular
information transfer needs of electronic subsystems
on shipboard platforms have been met by installing dedicated
cabling throughout the hull. With the ring structured
communication network, given suitable redundancy in network
paths, the situation should never again arise where ship
subsystems are functionally dependent upon a single
thin thread of dedicated wiring. The distributed ring
computing system with path redundancy will provide multiple,
physically separated paths throughout the ship which will
increase information transfer capacity and operational
survivability.

Considerable research into the configuration and uses of
ring structured distributed computing systems has been
conducted by David Farber [Ref. 2] of the University of
California at Irvine as well as by students and faculty
at the Naval Postgraduate School [Refs. 4, 5, 7, 9, 14].
In reference 11 Professor Farber discusses the increased
reliability achieved in a ring structured system by distributing
control of the ring among active users with a
"Fail Soft" philosophy of control. A "Fail Soft" system
is one which exhibits the property
of controlled system degradation rather than total failure
as a result of component failure. At the Naval Postgraduate
School (NPS) a ring data communication system utilizing
microprocessor technology has been designed and tested.
There is no intention at this time to reiterate the arguments
regarding the overall advantages of ring structured
computer systems over other possible configurations.
These arguments are more than amply covered in the references.
What is intended is to investigate and discuss the
system reliability advantages of a suitably redundant
version of a ring structured computer information system.

A. BASIC CONCEPTS OF A RING STRUCTURED DISTRIBUTED
COMPUTING  SYSTEM 

A  distributed  computing  system  is  process  oriented. 
That  is,  most  system  services  are  processes.  Within  each 
processor  connected  to  the  distributed  computing  system  is 
a  resident  software  nucleus  which  provides  local  resource 
management  and  interprocess  communication  services. 

The  key  element  to  the  distributed  computing  system  is 
the  method  of  communication  between  processors.   To  gain 
maximum  flexibility  the  system  should: 

1. Distribute control of the system (no one processor
is in control of the system at all times).

2.  Execute  processes  without  regard  to  their  physical 
location. 

3.  Permit  communication  between  processes  without 
regard  for  their  physical  locations. 

The first goal is necessary to ensure that catastrophic
failure does not occur to the system if one processor or
process fails. The second and third goals ease the job of
dynamic reconfiguration of the system.

In a ring structured system each processor in the network
is connected to a unidirectional high speed communication
ring by a ring interface (see Figure II-1). Only the
processor and associated ring interface in control of the
system can transmit a message. Messages are directed to a
process by name, as opposed to physical location, so that
where a particular process resides in the system is not
important to the message sender (location independence).

FIGURE II - 1. RING COMMUNICATION NETWORK

Message transmission is accomplished through a combination
of hardware and software. Transmitting a message from
one process to another causes the message to be passed
around the ring from station to station and to be copied
into the processor on which the destination process resides
by that processor's ring interface. Each ring interface has
a list of the processes resident in its processor. As a
message passes by, a ring interface compares the destination
process name in the message with the list of process
names, copies those messages for which there is a match
into the attached processor, and sets status bits at the
end of the message indicating whether the message
was copied, not copied, or copied incorrectly. Notice,
however, that while a ring interface is receiving a message
from the ring and delivering it to the host, it does not
remove the message from the ring. Instead, it merely
copies it, one byte at a time. This means that the message
continues around the ring and may be delivered to more than one
process in a single transmission sequence. This accomplishes
the third goal, communicating between processes
regardless of their physical location. When the message
finally returns to the sending ring interface it is taken
from the ring and the status bits are sent to its processor
to be checked for successful reception of the message.
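
The copy-and-forward behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the NPS firmware: the class names, the host labels, and the representation of the status bits as a list of (host, result) pairs are all assumptions made for the example, and the "copied incorrectly" case is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    destination: str                  # destination process name, not a location
    data: bytes
    status: list = field(default_factory=list)   # stands in for the status bits


@dataclass
class RingInterface:
    host: str                         # hypothetical host label
    resident_processes: set           # names of processes on the attached host
    received: list = field(default_factory=list)

    def relay(self, msg: Message) -> Message:
        """Copy the message to the host on a name match, then pass it on."""
        if msg.destination in self.resident_processes:
            self.received.append(msg.data)         # copied, not removed
            msg.status.append((self.host, "copied"))
        return msg


def transmit(ring, sender_index, msg):
    """Send msg once around the ring, starting with the station after the sender."""
    n = len(ring)
    for step in range(1, n):
        ring[(sender_index + step) % n].relay(msg)
    # Back at the sender: the message leaves the ring and its status is checked.
    return msg.status or [("none", "not copied")]


ring = [
    RingInterface("H0", {"logger"}),
    RingInterface("H1", {"nav"}),
    RingInterface("H2", {"nav", "radar"}),
]
status = transmit(ring, 0, Message("nav", b"fix"))
```

Because the message is copied rather than removed, both hosts that hold a process named "nav" receive it in a single pass, which is the multiple-delivery property noted in the text.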

Only the ring interface in control of the system can
transmit a message. Once the interface in control has
completed its message transmission and receipt, control is
passed to the next ring interface. In this manner control
is distributed among all ring interfaces. The actual control
of the ring is passed around by means of a "control
token." A ring interface may transmit a single message
only when it possesses the control token. This guarantees
that only one ring interface has control at one time. Independent
timers in the ring interfaces ensure that no one
ring interface monopolizes the ring and that, if control is
lost, control will be regained by one of the other ring
interfaces. The important thing to consider is that all ring
interfaces have the capability to take control of the ring
and that only one ring interface will have control at a
time under normal operation.
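
As a rough illustration of the control token discipline, the following Python sketch rotates a token around the ring and lets the holder send at most one message per possession. The function name and the use of per-interface queues are assumptions made for the example; the independent timers that recover a lost token are deliberately left out.

```python
from collections import deque


def run_token_ring(pending, rounds):
    """Simulate token rotation.

    pending: one deque of outgoing messages per ring interface.
    Returns the transmission order as (interface_index, message) pairs.
    """
    sent = []
    holder = 0                                   # interface holding the token
    for _ in range(rounds * len(pending)):
        if pending[holder]:
            # Possession of the token permits exactly one message.
            sent.append((holder, pending[holder].popleft()))
        holder = (holder + 1) % len(pending)     # pass the token downstream
    return sent


queues = [deque(["a1", "a2"]), deque(), deque(["c1"])]
order = run_token_ring(queues, rounds=2)
```

Interface 0 has two messages waiting but must give up the token after the first, so interface 2 transmits before interface 0 sends its second message.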

In order to punctuate the continuous flow of data on
the ring, two additional control type tokens are defined.
The Start of Message (SOM) token is used to tell all receiving
ring interfaces that a message is to follow. The
End of Message (EOM) token tells the ring interface two
things: First, it tells the ring interface that there is
no more information to relay to its host. Secondly, it
signals the CRC (error detecting) circuitry to check its
remainder for a transmission error. In the basic format
of the message sent, the SOM token is followed by the name
of the addressee, followed by the message data. At the
end of the message is the EOM token which is followed
by several bits used to check for proper receipt of the
message.
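
The message format just described can be mocked up as a byte string. The sketch below is hypothetical in every concrete detail: the token values, the fixed eight-byte addressee field, the single status byte, and the placement of a 16-bit CRC immediately before the EOM token are assumptions chosen only to make the example runnable, and Python's `binascii.crc_hqx` stands in for whatever CRC the ring hardware actually uses.

```python
import binascii

SOM = 0x01                                   # hypothetical token values
EOM = 0x04
NAME_LEN = 8                                 # hypothetical fixed-width addressee field
NOT_COPIED, COPIED, COPY_ERROR = 0, 1, 2     # hypothetical status codes


def build_frame(addressee: str, data: bytes) -> bytes:
    """SOM, addressee name, data, CRC, EOM, status byte."""
    name = addressee.encode("ascii").ljust(NAME_LEN)[:NAME_LEN]
    crc = binascii.crc_hqx(name + data, 0)
    return (bytes([SOM]) + name + data + crc.to_bytes(2, "big")
            + bytes([EOM, NOT_COPIED]))


def receive_frame(frame: bytes, resident: set) -> bytes:
    """Return the frame as it would be retransmitted, status byte updated."""
    if frame[0] != SOM or frame[-2] != EOM:
        raise ValueError("missing SOM or EOM token")
    name = frame[1:1 + NAME_LEN].decode("ascii").rstrip()
    if name not in resident:
        return frame                         # not addressed here; pass unchanged
    body = frame[1:-4]                       # addressee name plus data
    ok = binascii.crc_hqx(body, 0) == int.from_bytes(frame[-4:-2], "big")
    return frame[:-1] + bytes([COPIED if ok else COPY_ERROR])


frame = build_frame("nav", b"hello")
corrupted = frame[:10] + bytes([frame[10] ^ 0xFF]) + frame[11:]
```

The sender would inspect the final byte of the returned frame to learn whether the message was copied, not copied, or copied incorrectly.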

B.   HARDWARE  COMPONENTS  OF  A  RING  STRUCTURED  SYSTEM 

Each station on the distributed ring structured system
is subdivided into functional hardware/firmware components,
each having specific responsibilities. Figure II-1 shows
a conceivable ring communication configuration where a
"station" is defined as a host processor together with its
ring interface. In the earlier days of distributed computing
systems the ring interface and its associated components
were frequently hardwired. Such is the case of the
ring interface developed by Farber at the University of
California at Irvine [Ref. 3]. At the Naval Postgraduate
School, however, a modular approach was taken in designing
a cost effective and more flexible ring interface with
emphasis on firmware replacing hardware wherever possible.
The following, therefore, is an overview of the functional
responsibilities of the key hardware/firmware components
of the ring distributed computing system as envisioned at
the Naval Postgraduate School [Ref. 4]. Station 1 in
Figure II-1 shows the logical subdivision of each station
into functional hardware/firmware components.

1. The Repeater

The repeater ($r_i$ in Figure II-1) provides the necessary
signal boost to drive messages over long cable lengths.
It is designed to be directly connected to the ring, recover
the messages, recover clocking information, and pass on
cleaned up, reshaped data to the outbound cable. The
physical design of the repeater is dependent on several
characteristics of the actual system hardware. That is,
the cable length between stations, the cable type, the types
of receiver/drivers used, and ring speed each in its own
way affects repeater design. It should be noted that the
NPS hardware/firmware design physically combines the repeater
with the ring interface module. In this report most
references to the "ring interface" include the characteristics
and functions of both of these components.

2. The Ring Interface

In Figure II-1, although different processors are
connected to the ring, the functions performed by each
ring interface (RI) are to be the same at all stations:

(a) Data and control tokens traveling along the
ring are to be received, evaluated and
retransmitted.

(b)  Certain  checking  functions  are  to  be  performed 
and  status  information  is  to  be  sent  to  the 
host  processor. 

(c)  Control  signals  from  the  host  processor  must 
be  acknowledged  and  complied  with. 

A  ring  interface  incorporates  all  these  functions  in 
the  most  efficient  manner  independent  of  any  host  processor. 
Each  host  processor  communicates  with  its  ring  interface  via 
a  device  which  adapts  the  general  purpose  ring  interface 
to  the  host  computer's  specific  needs.   The  module  performing 
this  role  is  the  Host  Adapter. 

3.  The  Host  Adapter 

The Host Adapter (HA) in essence acts as an interpreter
between the host processor and the ring interface.
The host adapter is designed for a particular station and
as such is dependent on the host processor being served.
In some cases, host processors can be linked directly to
the ring interface with the host adapter functions being
accomplished by the host software.

The foregoing has been a general discussion of and
introduction to the basic hardware/firmware concepts of
the ring structured distributed system. For a detailed
discussion, reference should be made to Hirt [Ref. 5],
Meserve [Ref. 9], and Harris [Ref. 4].

III. SYSTEM DESIGN

A. FAILURE CONDITIONS IN RING STRUCTURED COMPUTING SYSTEMS

1.  The  Effects  of  Battle  Damage 

One of the more important questions addressed
in this chapter is that of system availability for
various ring system configurations. Controlled degradation
of performance under failure conditions is one of the greatest
potential advantages of the ring structured distributed
computing system. Almost all previous work with these ring
systems has dealt with the single ring system concept, which
did not have to survive the most obvious failure
modes in a shipboard military environment: battle
damage, collision damage, and component failure. Since the
probability of a shipboard ring being severed by battle
damage is very real, it has to be considered seriously. A
system failure caused by component malfunction has to be
localized and then repaired or bypassed unless there is
sufficient redundancy built into the system.

2. System Failure and Maintainability

Aside from the effects of battle damage, the impact
of system maintenance on overall system availability must
be considered. Given any system where performance requirements
are met, the addition of capability to ensure continuous
reliable performance during routine and emergency maintenance
situations is essential. This is especially true
where system performance criteria require that a certain amount
of maintenance be accomplished. It is desirable to
continue system operation even during maintenance actions,
when some components will be out of service.

A ring communication system would be particularly
affected by the choice of backup capabilities incorporated
into the overall system design. For example, consider designing
a system where the tradeoff is increased performance
at a cost of lengthier repair times. If failure is
very rare, then repair times are less important. On the
other hand, if frequent system repair is anticipated, the
excessive repair times with their associated costs and system
degradation may far outweigh performance advantages.

Where battle damage, everyday system failures, and
maintenance actions occur, it is clear that one possible
solution to sustained performance is to include some sort
of path (cable) redundancy in the design of the ring to
provide alternate message paths when some elements fail.
If, as expected, the increase in redundancy enhances
system availability, then the increase in availability
will also directly contribute to the control of degradation
of performance under failure or maintenance conditions.

In order to investigate the potential of increased
redundancy as a key to enhanced ring system availability,
the following sections will describe in detail two alternatives
for such availability enhancement using added
transmission paths.

B.   THE  POTENTIAL  OF  REDUNDANCY  FOR  IMPROVING  SYSTEM 
AVAILABILITY 

1.   Protective  Redundancy 

Redundancy  (the  availability  of  more  than  one  means 
of  performing  a  function  when  fewer  are  required)  is  a 
principal  method  of  achieving  increased  reliability.   In 
the  two  alternative  methods  to  be  discussed  later  in  this 
thesis,  active  parallel  redundancy  will  be  considered  to 
improve  system  availability. 

Where systems are operated together with all operable
(unfailed) systems performing the function all the
time, a system is said to be active parallel redundant.
An aircraft which can still fly when two of its four engines
fail is an example of the protection provided by this type
of redundancy.
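
The benefit of active parallel redundancy can be quantified with the standard k-out-of-n binomial formula. The sketch below assumes that units fail independently and share a common availability, which is an assumption of the example rather than a statement from the text; the engine availability of 0.9 is likewise a made-up figure.

```python
from math import comb


def k_of_n_availability(k, n, a):
    """Probability that at least k of n independent units are working,
    where each unit has availability a."""
    return sum(comb(n, j) * a**j * (1 - a)**(n - j) for j in range(k, n + 1))


# Aircraft example: flight continues with at least 2 of 4 engines.
engines = k_of_n_availability(2, 4, 0.9)

# Three parallel paths, any one of which is sufficient.
paths = k_of_n_availability(1, 3, 0.99)
```

With k = 1 the formula reduces to one minus the probability that every unit has failed, which is the form used for parallel data paths.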
2. Application of Path Redundancy to a Distributed
Ring Computing System

The potential of damage or disruption to a shipboard
ring in a battle environment is very real and critical to
a ship's fighting ability. Once a ring system suffers a
severed or degraded communication path, link, or repeater,
the following conditions must be met to continue operation
of the ring:

(a) The condition that the ring has been cut or
degraded must be detected.

(b) The disruption in the ring must be located.

(c) The disruption must be repaired or bypassed.

As soon as the ring has been disrupted, the station
immediately after the disruption will be aware of the situation.
If the ring is without redundancy the system will
remain inoperative until the disruption is repaired. Some
kind of path redundancy must therefore be built into
the ring to allow for the localization and/or bypassing of
damage to ensure a continuity of operation. The following
two system concepts, Triple Modular Redundancy (TMR) and
Triply Redundant Multiple Paths (TRMP), provide alternative
approaches to the problem of installing path redundancy in
the distributed ring system. Both approaches deliver significantly
better availability than the single non-redundant
ring.

C. SPECIFIC REDUNDANCY SPECIFICATIONS

1. Triple Modular Redundancy (TMR)

The  TMR  approach  for  adding  redundant  paths  to 
the  distributed  computing  ring  system  described  in  Chapter 
II  is  the  most  direct  and  least  complex.   In  the  classical 
"N"  modular  redundancy  approach  to  the  ring  system  there 
are  some  number,  N,  of  complete  communication  paths  between 
stations  on  the  ring.   For  the  sake  of  clarity  and  to  reduce 
complexity  of  discussion,  the  Triple  Modular  Redundant  Case 
(N=3  paths)  will  be  discussed.   The  concept  of  three  complete 
paths  between  stations  is  analogous  to  having  three  complete 
data  rings  in  the  system  with  the  three  rings  going  through 
each station. Figure III-1 illustrates a TMR configuration
for  a  ring  system  with  five  stations. 

In TMR, all stations would listen to one of the incoming
paths until such time that an error is detected. If
an error were detected while a message is being transmitted,
the station would wait for a retransmission of the message
on the same path. If an error is again detected, the
listening station would send, on the outbound path detected
as providing erroneous data, a high priority message directing
each station to switch output path transmissions to a
preselected backup path. Once this switching of paths has
been accomplished, synchronization procedures must be
completed. These specific synchronization procedures, which
are necessary for initializing the ring, are discussed in
depth by Harris [Ref. 4] and will not be elaborated here.

Referring to Figure III-1, it can be seen that the
five stations are represented in a serial arrangement, each
station designated by $S_i$ where $i = 1, 2, 3, 4, 5$. The stations
themselves are divided into two separate modules: the
repeater module (r) and the data path module (p).

The  repeater  module  consists  of  the  ring  interface/ 
repeater  hardware  as  well  as  the  selector  switch  circuitry 
which  picks  which  of  the  three  incoming  data  paths  is  to  be 
used  as  primary  path  of  communication  to  the  host  computer. 
In  essence,  the  repeater  module  is  the  cornerstone  of  each 
station  in  TMR  since  failure  of  any  single  repeater  module 
causes  a  system  failure. 

Having brought up the topic of system failure, it is
advisable at this point to state specifically the
conditions for an operational system. A system (TMR or TRMP)
is operational if, given that a specific station is operating,
a transmission path exists such that it is possible
for the given station to transmit a message around the ring
and back to itself.

It can be seen that both the repeater and data path
modules play a key part in system success. The data path
module (p) is composed of the three redundant paths of
communication between successive repeater modules (see
paths $A_i$, $B_i$, $C_i$ in Figure III-1). Each path is the line
of communication from the output of one repeater to the
input of the following repeater module.

For the purposes of calculating system reliability
in TMR, each repeater module (r) and each data path module
(p) are assigned a specific availability value. Availability
is defined as the ratio of the time that the system is operational
to the total amount of time that it is, or may be,
needed. The total time is the sum of the usable time and
downtime for maintenance/repair as shown in equation 3.1.

$$\text{Availability} = \frac{\text{uptime}}{\text{uptime} + \text{downtime}} \tag{3.1}$$

Downtime is the result of a number of maintenance/repair
actions. The average downtime is sometimes called the
mean time to repair (MTTR) and it includes both scheduled
and nonscheduled downtimes. Availability can also be expressed
as the ratio of the mean time between failures (MTBF)
divided by the sum of the MTBF plus the MTTR as shown in
equation 3.2.

$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \tag{3.2}$$

The key to the relation between availability and reliability
is the MTBF. The greater the reliability, the better the
availability, given that MTTR can be held fairly stable.
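
Equation 3.2 can be exercised with a short Python function. The MTBF and MTTR figures below are invented purely for illustration and do not come from any measured hardware.

```python
def availability(mtbf_hours, mttr_hours):
    """Equation 3.2: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical component: fails on average every 990 hours, 10 hours to repair.
baseline = availability(990, 10)

# Doubling the MTBF with the same MTTR raises availability.
improved = availability(1980, 10)
```

Holding MTTR fixed, the improved reliability (longer MTBF) yields the higher availability, which is the relationship stated above.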

For the purposes of system availability calculations,
all repeaters are assumed identical and their availabilities
are taken as equal. Similarly, the availabilities of all
paths are taken as equal, where $P(C_i) = P(B_i) = P(A_i) = a_L$.
Repeater module availability is labeled $a_r$, data path module
availability $a_p$, and station availability $a_s$. Total system
availability for TMR is labeled $a_{TMR}$.

a. TMR AVAILABILITY CALCULATION

Referring to Figure III-1, it can be seen that
TMR system availability is calculated as a function of the
repeater module availabilities and the data path module
availabilities. Since the data path module itself is
triply parallel redundant, the availability of the data
path module ($a_p$) is calculated as:

$$a_p = 1 - (1 - a_L)^3 \tag{3.3}$$

The availability of the station ($a_s$) is the product of the
availabilities of the repeater and data path modules:

$$a_s = a_r \cdot a_p \tag{3.4}$$

Having knowledge of the station availability,
the overall system availability for TMR ($a_{TMR}$) can now be
ascertained:

$$a_{TMR} = (a_s)^N = \left[ a_r \left( 1 - (1 - a_L)^3 \right) \right]^N$$

where $N$ = the number of stations on the ring system.
For example, with $a_L = .99$, $a_r = .98$ and $N = 5$,

$$a_{TMR} = \left[ a_r \left( 1 - (1 - a_L)^3 \right) \right]^N = .904$$

To compare the TMR system availability data with
a similar system with no redundant characteristics, the
following calculations are presented:

Given: $a_r = .98$,
$a_L = a_p = .99$,
$N = 5$

therefore, station availability is:

$$a_s = a_r \cdot a_p = (.98)(.99) = .97$$

Non-redundant system availability is:

$$a_{\text{non-redundant}} = (.9702)^5 = .86$$
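
The two worked examples above can be checked numerically. The following Python sketch simply evaluates equations 3.3 and 3.4 and raises the station availability to the power of the number of stations; the function and parameter names are chosen for the example and are not taken from the thesis.

```python
def path_module_availability(a_link, paths):
    """Equation 3.3 generalized to any number of parallel paths."""
    return 1 - (1 - a_link) ** paths


def ring_availability(a_repeater, a_link, stations, paths):
    """Station availability (equation 3.4) raised to the number of stations."""
    a_station = a_repeater * path_module_availability(a_link, paths)
    return a_station ** stations


tmr = ring_availability(0.98, 0.99, stations=5, paths=3)
single = ring_availability(0.98, 0.99, stations=5, paths=1)
print(f"TMR: {tmr:.3f}   non-redundant: {single:.3f}")
```

In this model the TMR result is dominated by the repeater term, since the triply redundant path module is nearly perfect; this anticipates the weakness of the repeater module discussed next.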

Although TMR enhances system availability, a
serious potential weakness in TMR is created by the uniqueness
of the repeater module (r). From Figure III-1 it can
be seen that all triply redundant data path modules (p) must
interface directly with station repeater modules. If one
repeater module were disabled, all data paths would be
interrupted.

This situation could be alleviated if repeater
modules were connected in different sequences. Methods
could be devised to utilize selected alternate paths to
bypass defective repeater modules without interrupting overall
system operations. The Triply Redundant Multiple Path
(TRMP) system configuration for distributed rings provides
a possible solution to this situation.

2. Triply Redundant Multiple Path (TRMP) System Design

Figure III-2 shows a TRMP system of five stations,
having three path redundancy. Only one of the three outgoing
data paths from each repeater module links with the
immediately succeeding repeater module (see Figure III-2,
path $C_i$, for example). A second outbound path links with
the second successive repeater module, skipping past one
module (Figure III-2, path $B_i$). The third outbound path
skips the first and second modules/stations and links with
the third successive module (Figure III-2, path $A_i$). Note
that each path emanating from every repeater module is labeled
$A_i$ or $B_i$ or $C_i$, where $i$ is the number of the repeater.

In future discussion a path C will be said to have
a logical "distance" of one, B a distance of two and A a
distance of three, referring to the difference between the
sequence numbers of the originating and terminating stations.
The actual physical lengths of the paths may have any relationship,
depending on the placement of stations in the ship.

The  arrangement  of  data  paths  in  relation  to  station 
number  is  the  key  to  TRMP.  With  TRMP  it  is  possible  for 
two  consecutive  repeater  modules  to  be  inoperative  while 
still  having  the  system  working.   In  addition,  it  is  possible 
to  have  combinations  of  paths  and  repeaters  inoperable  while 
still  maintaining  an  operational  system  overall. 
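
The operational condition stated earlier (a station can send a message around the ring and back to itself) can be tested mechanically for a TRMP layout. The sketch below is one possible reading of that condition, under the assumption that a message only ever moves forward by one, two, or three stations per hop and must complete exactly one circuit; the function name, the zero-based station numbering, and the (origin, distance) encoding of failed paths are all conventions invented for the example.

```python
def can_circulate(n, start, failed_repeaters=frozenset(), failed_links=frozenset()):
    """True if a message from `start` can travel once around an n-station
    TRMP ring and return to `start`.

    failed_repeaters: set of station numbers (0..n-1) that are inoperative.
    failed_links: set of (origin_station, distance) pairs, distance in 1..3.
    """
    if start in failed_repeaters:
        return False
    # reachable[d] is True if the message can be d stations downstream of start.
    reachable = [False] * (n + 1)
    reachable[0] = True
    for pos in range(n):
        if not reachable[pos]:
            continue
        origin = (start + pos) % n
        for distance in (1, 2, 3):
            nxt = pos + distance
            if nxt > n:
                continue                     # would overshoot the sender
            target = (start + nxt) % n
            if target in failed_repeaters or (origin, distance) in failed_links:
                continue
            reachable[nxt] = True
    return reachable[n]


two_down = can_circulate(5, 0, failed_repeaters={1, 2})
three_down = can_circulate(5, 0, failed_repeaters={1, 2, 3})
```

Under these assumptions, a five-station ring survives the loss of two consecutive repeaters because the distance-three path bypasses both, whereas three consecutive failures leave no usable outbound path.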

In  a  shipboard  environment  with  TRMP  distributed 
ring  systems,  any  ship  could  sustain  damage  at  many  points 
throughout  the  ring  system  with  data  paths  and  repeater 
modules  alike  being  rendered  inoperative.   It  must  be 
pointed out, however, that it could be possible to disrupt
all system operation in a battle damage environment through
selective disabling of repeater modules and data paths.
Nevertheless, the flexibility of TRMP in coping with potential
damage and/or maintenance requirements presents significant
advantages over systems with little or no redundancy or
even over those having TMR characteristics. This fact will
be borne out in later studies of availability calculations.

Because of the unique arrangement of paths in TRMP,
only certain numbers of stations may be included in the
overall system (stations in this sense refer only to the
ring repeater module connections on the ring). If the
number of stations were even, a complete circuit of distance-two
links could be formed which locks out half the
stations. Similarly, if the number of stations were a
multiple of three, a complete circuit of distance-three
links could be formed which locks out two thirds of the
stations. See Figures III-3 and III-4 for illustration.

The  limitation  on  TRMP  then,  is  that  the  number  of 
stations  on  the  entire  ring  system  cannot  be  divisible  by 
two  or  three.   In  actual  applications,  dummy  stations  could 
be  inserted  into  the  ring  to  ensure  that  this  limiting 
condition  is  met. 
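
The divisibility condition can be checked mechanically. The
following is a minimal sketch in Python and is not part of the
original study; all function names are illustrative. It counts
the stations visited when only links of one distance are
followed, and finds the smallest admissible ring size for a
given number of real stations, the difference being the number
of dummy stations to insert.

```python
from math import gcd


def stations_reached(n_stations: int, distance: int) -> int:
    """Stations visited by following only links that skip `distance` ahead.

    On a ring of n_stations, repeated hops of the same distance visit
    n_stations / gcd(n_stations, distance) distinct stations.
    """
    return n_stations // gcd(n_stations, distance)


def is_valid_trmp_size(n_stations: int) -> bool:
    """True when the station count is divisible by neither two nor three."""
    return n_stations % 2 != 0 and n_stations % 3 != 0


def next_valid_trmp_size(real_stations: int) -> int:
    """Smallest admissible ring size that can hold `real_stations`."""
    n = max(real_stations, 1)
    while not is_valid_trmp_size(n):
        n += 1
    return n


if __name__ == "__main__":
    for n in (9, 10, 11):
        print(n, stations_reached(n, 2), stations_reached(n, 3),
              is_valid_trmp_size(n))
    real = 8
    print("dummy stations needed:", next_valid_trmp_size(real) - real)
```

With ten stations the distance-two links close on themselves
after five stations, and with nine stations the distance-three
links close after three, which are the lock-out cases described
above.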

FIGURE III-3. TRMP WITH A MULTIPLE OF TWO STATIONS

FIGURE III-4. TRMP WITH A MULTIPLE OF THREE STATIONS

a.   TRMP OPERATING SEQUENCE

In the TRMP system each station utilizes a
switch to select which of the three paths will be monitored
for the incoming signal. Outbound data is transmitted on
all three outgoing paths. If a station receives degraded
or disrupted signals on the path being monitored, it waits
until it has timed out or received a second unsuccessful
transmission on that path and then switches to an alternate
path and repeats the sequence. If all three input paths to
a particular station are disrupted, that station receives no
further messages and outputs a priority message to the ring
via its three outgoing paths. This message indicates the
condition, thus alerting the system users that one particular
station is unable to receive or transmit any further
messages. In TRMP, while the station in question is cut
off from the system, the system continues to function because
two alternate paths which bypass the inoperative
station allow a continuity of message transmission around
the ring.
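
The operating sequence above can be sketched as a small state
machine. This is an illustrative Python model only, not the
repeater logic of the original design. The threshold of two
unsuccessful transmissions and the timeout come from the text;
the class name, the order in which alternate paths are tried,
and the rule that a path once judged disrupted is not retried
are assumptions of the sketch.

```python
from dataclasses import dataclass, field

PATHS = ("A", "B", "C")  # labels for the three incoming paths


@dataclass
class InputSelector:
    """Illustrative model of one station's incoming path switch."""
    monitored: int = 0                           # index into PATHS
    bad_count: int = 0                           # failures on monitored path
    failed: set = field(default_factory=set)     # paths judged disrupted

    def good_reception(self) -> None:
        """A clean message arrived, so the failure count starts over."""
        self.bad_count = 0

    def bad_reception(self, timed_out: bool = False) -> str:
        """Handle a degraded message or a timeout; return the action taken."""
        self.bad_count += 1
        if not timed_out and self.bad_count < 2:
            return "wait"                        # tolerate the first failure
        self.failed.add(self.monitored)
        self.bad_count = 0
        if len(self.failed) == len(PATHS):
            return "isolated"                    # send the priority message
        while self.monitored in self.failed:     # assumed order: A, B, C
            self.monitored = (self.monitored + 1) % len(PATHS)
        return "switched to " + PATHS[self.monitored]
```

A station that returns "isolated" is the one that would issue
the priority message on its three outgoing paths.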

b.   TRMP  AVAILABILITY  CALCULATION  DEVELOPMENT 

The calculation of system availability for TRMP
is not nearly as straightforward as for TMR. From Figure III-2
it will be seen that TRMP is organized with transmission
device interfaces/repeaters represented by circles labeled
$r_i$ and data paths represented by directed arcs labeled
$A_i$, $B_i$ or $C_i$. All devices are identical in construction
and all data paths are identical as well. A message sequence
begins by specifying which of the devices and paths are
operational and which are not. Therefore, the first question
that must be addressed is: what is the probability that at
least one station on the ring is operational (a station in
this sense being a combination of host computer and ring
repeater)? This event will be called $S_1$ and its probability
is found as follows:

The probability that a given station is operational
can be expressed as one minus the probability that
the station is inoperative.

    $P[\text{a station up}] = 1 - P[\text{a station down}]$        (3.6)


In a distributed ring system where station
failures are independent, where individual stations are
linked in a serial manner, and where each station in the
ring has the same probability of being operational, the
probability that every station in the system is not
functioning is:

    $P[\text{all stations in system inoperative}] = \left(P[\text{any station down}]\right)^N$        (3.7)

where N is the total number of stations on the ring.

It follows that the probability of at least one
station in the system being operational is one minus the
probability of all stations in the system being inoperative.

    $P(S_1) = P[\text{at least one station operational}]$
    $\quad = 1 - P[\text{all stations in system inoperative}]$
    $\quad = 1 - \left(P[\text{any station down}]\right)^N$        (3.8)

In calculating the probability that a given
message will successfully transit the ring, some ground
rules have to be established. First, each device (ring
repeater) functions throughout the message sequence with
probability p, independent of all other devices. Similarly,
each data path functions successfully for a message sequence
with individual, independent probability q. For N
devices and associated data paths, a message sequence begins
with the creation of a message by a functioning device. Each
other device may or may not be functioning. The message
proceeds by passing over functioning data paths and through
functioning repeaters until it succeeds in again
reaching its creating device.

The problem is to determine the probability that
any given message sequence will succeed, for specified
values of p and q and a system size N.

In Figure III-5, which is a linear representation
of TRMP, data paths and repeater modules have been labeled
to facilitate the calculation of system availability which
will follow. To simplify the overall concepts of TRMP,
the linear description previously seen is modified in
Figure III-5 by adopting directed arcs in lieu of devices.
As shown in Figure III-6, transmission through the individual
station ($d_i$) is taken as a pair of events, reception
($r_i$) and transmission ($t_i$).

Figure III-6. TRMP Event Structure

The calculation of the probability of system
success will use a recursive technique beginning at the
origin of the message (at the left in the directed arc diagram,
Figure III-5) and proceeding for each event to define
the characterization of success. It can be seen that with
a triplicated path structure it would be possible for a
message to make three complete passes around the ring prior
to achieving a successful transit. Therefore a characterization
of success will be calculated for three passes of
the ring.

In the derivation shown in Table III-1, based
on a system depicted in Figures III-5 and III-6, station 5
will initiate the original message.

In Table III-1 each event is doubly subscripted
($r_{ij}$ or $t_{ij}$). The subscript i represents the specific event
number and will vary up to the maximum number of stations on
the ring. The subscript j indicates on which of the three
passes an event is occurring. For example, $r_{31}$ is the
reception at station 3 on pass 1 around the ring and $t_{33}$ is
the transmission from station 3 on pass 3 around the ring.

Referring to Table III-1, it can be seen in the
originating pass that successful reception at station 1,
($r_{11}$), of a message originating at station 5 is defined as
the event that link $C_1$ is up. The message is transmitted from
station 1 if it is received ($r_{11}$) and passes through the
repeater ($d_1$):

    $t_{11} = r_{11} \wedge d_1$   (written $r_{11} d_1$)

Correct reception at station 2 can come via the
distance-two link from station 5, or via the distance-one
link through station 1 ($C_2 t_{11}$):

    $r_{21} = A_1 \vee C_2 t_{11}$

From the calculations in Table III-1, a generic
recurrence in each of the three passes through the system,
once the first few events of the pass are past, is found to be:

    $r_{i,j} = A_{i-2}\, t_{i-3,j} \vee B_{i-1}\, t_{i-2,j} \vee C_i\, t_{i-1,j}$
    $t_{i,j} = r_{i,j}\, d_i$        (3.9)

This recurrence is relatively easy to use for
computing system availability by substituting repeater
availabilities for $d_i$ and data path availabilities for $A_i$,
$B_i$, $C_i$. This process will be discussed more fully later in
this section. The main point to remember at this juncture,
however, is that if the above concept is applied to a computer
program, the values of $r_i$ and $t_i$ are easily stored
as linear arrays and the computational effort increases
linearly with N.

Combination of the probability of success for
each of the three passes will give the probability that a
successful message sequence will occur on the originating
pass or the second pass or the third pass. This success is
expressed as:

    $S_2 = r_{N1} \vee r_{N2} \vee r_{N3}$

where N = maximum number of stations in the system.
In terms of probability this same formula is expressed as:

    $P(S_2) = 1 - (1 - P(r_{N1}))(1 - P(r_{N2}))(1 - P(r_{N3}))$        (3.10)

Having previously calculated the probability
that at least one station is operational, $P(S_1)$, and the
probability of a successful message sequence, $P(S_2)$, the
overall probability of system success or availability of
the TRMP System will be expressed as the product of the two
named probabilities, or:

    $a_{TRMP} = P(S_1) \cdot P(S_2)$        (3.11)
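
Equations 3.8, 3.10 and 3.11 combine in a few lines. The
following Python sketch is offered only as an illustration
under the assumptions already stated in the text (independent,
identical stations); the function and argument names are not
from the original program.

```python
def prob_at_least_one_station_up(p_station_down: float,
                                 n_stations: int) -> float:
    """Equation 3.8: one minus the chance that all N stations are down."""
    return 1.0 - p_station_down ** n_stations


def prob_message_success(p_rn1: float, p_rn2: float, p_rn3: float) -> float:
    """Equation 3.10: success on the first, second or third pass.

    Each argument is the probability of final reception on one pass.
    """
    return 1.0 - (1.0 - p_rn1) * (1.0 - p_rn2) * (1.0 - p_rn3)


def trmp_availability(p_s1: float, p_s2: float) -> float:
    """Equation 3.11: product of the two named probabilities."""
    return p_s1 * p_s2


if __name__ == "__main__":
    p_s1 = prob_at_least_one_station_up(0.5, 2)
    p_s2 = prob_message_success(0.5, 0.5, 0.5)
    print(p_s1, p_s2, trmp_availability(p_s1, p_s2))
```

The pass probabilities passed to prob_message_success would
come from the recurrence of Equation 3.9; the values in the
usage lines are arbitrary inputs chosen only to exercise the
functions.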

c.   TRMP  EVENT  CHARACTERIZATIONS  AS  PROBABILITY 
EXPRESSIONS 

After  development  of  event  characterizations  for 
three  system  passes,  it  is  necessary  to  calculate  the  system 
probability  of  success. 

The probability of any given successful event
from Table III-1 is expressed in terms of the probabilities
that certain components are available, and the probabilities
of preceding events. The availabilities of all paths are
assumed equal: $P(A_i) = P(B_i) = P(C_i) = a_L$ (for all i). Likewise,
the probability that each repeater is functioning is
assumed the same for each station: $P(d_i) = a_r$. For example,
the event characterization $r_{11}$ is expressed as $r_{11} = C_1$. The
probability of event $r_{11}$ is $P(r_{11}) = P(C_1) = a_L$. Likewise the
probability of event $d_i$ is $P(d_i) = a_r$.

Paths and repeaters are assumed to fail independently
so that,

    $P(t_{11}) = P(d_1 r_{11}) = P(d_1)\,P(r_{11}) = a_r a_L$

The events that "the repeater is operational"
and "the path is operational" are not mutually exclusive
events however, so the probability of event $r_{21} = A_1 \vee C_2 t_{11}$
is:

    $P(r_{21}) = P(A_1 \vee C_2 t_{11}) = 1 - (1 - P(A_1))(1 - P(C_2 t_{11}))$
    $\quad = 1 - (1 - a_L)(1 - a_L^2 a_r)$
    $\quad = a_L + a_L^2 a_r - a_L^3 a_r$

Thus the probabilities of each of the successive event
characterizations in Table III-1 can be expressed in terms
of $a_r$ and $a_L$:

    $t_{21} = d_2 r_{21}$
    $P(t_{21}) = P(d_2) \cdot P(r_{21})$
    $\quad = a_r (a_L + a_L^2 a_r - a_L^3 a_r)$
    $\quad = a_r a_L + a_L^2 a_r^2 - a_L^3 a_r^2$


Proceeding similarly,

    $r_{41} = C_4 t_{31} \vee B_3 t_{21} \vee A_2 t_{11}$

so that,

    $P(r_{41}) = 1 - (1 - P(C_4 t_{31}))(1 - P(B_3 t_{21}))(1 - P(A_2 t_{11}))$
    $\quad = 1 - (1 - P(C_4 t_{31}))(1 - 2a_L^2 a_r - a_L^3 a_r^2 + 2a_L^4 a_r^2 + a_L^5 a_r^3 - a_L^6 a_r^3)$

At this point the development of this event
characterization was abandoned due to the obvious complexity
which had developed and which was increasing with each event.

A simple and straightforward computer program was
written to calculate the probabilities of the event
characterizations in Table III-1. In each pass, the first few events
differ, but most of the calculations are repeated applications
of Equation 3.9. The program was written in Fortran
IV and used double precision floating point variables. The
results of this program are presented in the following section
in the form of system availability curves illustrating
TRMP system availability as a function of system size,
repeater availability and data path availability.
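
The original Fortran IV program is not reproduced here. As an
illustration of how Equation 3.9 turns into a linear-time
computation, the following Python sketch evaluates the
originating pass only. It rests on assumptions that are not
taken from Table III-1: the originating station is given index
0 and is taken to transmit with certainty, links that would
start before the originator are simply omitted, and the three
incoming terms are combined as independent events in the same
way as in the hand calculation of $P(r_{21})$. The second and
third passes, whose starting events differ, are not modeled.

```python
def first_pass_reception(n_stations, a_r, a_l):
    """Return P(r_i1) for i = 0..n_stations on the originating pass.

    a_r is the repeater availability and a_l the data path
    availability. Index 0 is the originator (assumed certain to
    transmit); index n_stations is the message arriving back at it.
    """
    r = [0.0] * (n_stations + 1)    # reception probabilities
    t = [0.0] * (n_stations + 1)    # transmission probabilities
    r[0] = t[0] = 1.0
    for i in range(1, n_stations + 1):
        miss = 1.0                  # chance that no incoming link delivers
        for distance in (1, 2, 3):  # links C, B and A respectively
            if i - distance >= 0:
                miss *= 1.0 - a_l * t[i - distance]
        r[i] = 1.0 - miss
        t[i] = r[i] * a_r           # Equation 3.9: t = r d
    return r


if __name__ == "__main__":
    a_r, a_l = 0.8, 0.8
    r = first_pass_reception(11, a_r, a_l)
    closed_form = a_l + a_l**2 * a_r - a_l**3 * a_r
    print(r[1], r[2], closed_form, r[-1])
```

Under these assumptions the first two entries agree with the
hand results $P(r_{11}) = a_L$ and
$P(r_{21}) = a_L + a_L^2 a_r - a_L^3 a_r$, and the work grows
linearly with the number of stations because each step reads
only the three previous transmission values.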
3.   System  Availability  Comparisons 

The data plotted in Figure III-7 illustrate the
distinct advantage that TRMP has over TMR or non-redundant
systems. One of the more important aspects that should be
noted is that the TRMP configuration can deliver very high
availability (approximating .99) even when repeater and data
path availability is in the range of 0.8. Additionally, it
should be noted that a TRMP configuration of 101 stations
is only slightly different from the TRMP system of 11 stations,
and for this larger number of stations, TRMP availability
far exceeds that of TMR or non-redundant systems at
all values of repeater/data path availability.

Figure III-8 graphically illustrates the TRMP system
probability of failure (1 minus the availability) for systems
of varied size and repeater/data path availabilities exceeding
0.9. System probability of failure is graphed instead
of system availability because at repeater and data path
availabilities exceeding 0.9, the system availability becomes
close to 1. Here again the relative closeness of a
TRMP system of 101 stations to the system of 11 stations
illustrates how a larger number of stations does not significantly
affect the high overall system availability for TRMP.
It should be noted that TMR and non-redundant configurations
were not plotted as their availability curves would appear
as nearly vertical lines on the extreme right edge of the
figure.

4.   Shipboard  Data  Multiplex  System 

A system currently under evaluation for the U.S.
Naval Sea Systems Command, called Shipboard Data Multiplex
System (SDMS), is similar in many respects to the Distributed
Ring Computing System with TRMP (DRCS/TRMP). Intended to
be a general purpose data transfer system for shipboard
use, SDMS has a modular redundant design. In SDMS there are
five separate bus lines (data paths) for a redundant data
path transmission capability, whereas DRCS/TRMP utilizes
three-way redundant paths. While the two approaches are
similar in concept they are far different in application.
In the SDMS configuration, redundant data paths are both
time-division and frequency-division multiplexed whereas
distributed ring systems have a single, time-shared digital
communication stream. Another aspect of path redundancy
where SDMS and DRCS/TRMP differ is in the manner and sequence
in which stations are interconnected.

One area where the systems differ little is that
both systems have totally asynchronous and distributed control
of message traffic. Just as in DRCS/TRMP, SDMS has no
central  processing  device  which  by  failing  can  interrupt 
data  transfer.   It  has  been  designed  so  that  failure  modes 
result  in  a  gradual  and  graceful  degradation  of  system 
availability  rather  than  a  sudden  loss  of  information 
transfer. 

Regardless of their differences, however, each system
does provide a higher degree of reliability and
survivability in the information transfer function than do
current shipboard systems. In comparing the two systems
for overall availability, it is found that both function
with very high levels of availability. DRCS/TRMP availability
approximates .9999999 when data path and ring repeater
availabilities are greater than .95. When repeater availability
is increased to .999 the probability of system
failure  approaches  10    .   Repeater  availabilities  approxi- 
mating .999  are  reasonable  as  will  be  shown  in  Chapter  IV. 
SDMS,  on  the  other  hand,  produces  an  availability  in  the 
vicinity  of  .9999999999.   In  SDMS,  system  availability  is 
a  measure  of  the  overall  system's  ability  to  process 
90  per  cent  of  the  normal  rated  traffic  demand  expected 
of  the  system. 

For  information  on  the  specific  system  design 
specifications  and  operational  capabilities  of  SDMS,  the 
interested  reader  should  consult  reference  14. 

With a system based on low-risk electronic technology
and with many other design principles of DRCS/TRMP,
SDMS is worth further investigation as one offshoot of distributed
system technology. Its emergence as a viable information
transfer system in a shipboard environment indicates
that great potential exists for redundant distributed
systems.

IV.   PROPOSED RING REPEATER DESIGN

A.   FUNCTIONAL  ANALYSIS  OF  PROPOSED  RING  REPEATER  DESIGN 

The  ring  repeater  discussed  throughout  this  thesis  is 
designed  to  connect  directly  to  the  incoming  ring  cable, 
receive  the  signal,  recover  clocking  information,  and  pass 
on  reshaped  (and  possibly  retimed)  data  to  the  outbound 
cable.   To  design  the  repeater,  then,  one  must  know  what 
type  of  cable  is  to  be  used,  what  transmission  distances 
are  required,  what  type  of  driver/ receivers  are  to  be  used 
and  what  transmission  speeds  are  used. 

A repeater designed to carry out the functions of TRMP
was derived mainly from a repeater design discussed by
Harris [Ref. 4, p. 80] and is diagrammed in Figure IV-1.
Since most developmental work and testing was conducted
by Harris, no such efforts were conducted during this study.
For the purposes of this thesis, the basic design was
analyzed from a component availability perspective in an
effort to develop the data required for overall system
availability.

The function and purposes of the various components of
Figure IV-1 are now briefly discussed.

Ring Cable

Only  a  single  high  speed  bitstream  is  transmitted  in 
this  self-clocking  ring  system.   This  would  suggest  the  use 
of  single  wire,  coaxial  cable,  twisted  pair  or  fiber  optic 
transmission  media.   Of  these  four,  only  the  last  three  are 
worth  considering  due  to  their  high  immunity  to  the  problems 
of  electromagnetic  interference  in  the  shipboard  environment. 
All three of these transmission media are capable of transmitting
data at the speed required on board ships and all
possess very low failure rates.

Data  Path  Selection  Switch 

The  purpose  of  this  switch  is  simply  to  select  the  desired 
data  path  based  on  the  quality  of  messages  received.   It  is 
basically  a  3:1  multiplexer  which  accepts  twisted  pair  inputs 
and outputs to a bypass relay. Data path selection is determined
by the ring interface.

Bypass  Relays 

The purpose of the bypass relays is simply to switch
the repeater out of the ring in case of power failure or
for repeater maintenance.

Line Drivers and Receivers

Integrated  circuit  line  drivers  and  receivers  are 
readily  available  from  many  sources.   The  receiver  accepts 
twisted  pair  inputs  and  provides  a  TTL  output  to  interface 
with  standard  logic  circuits.   The  driver  accepts  a  TTL 
input  and  transmits  to  twisted  pair  cable. 

Recent  advancements  in  optoelectronics  have  produced 
optically  isolated  receivers  with  very  high  immunity  to 
noise.   While  earlier  models  were  restricted  to  lower 
data  rates,  recent  models  now  are  capable  of  megabit  speeds 
and  are  relatively  inexpensive.   These  optically  isolated 
receivers  are  compatible  with  many  of  the  differential 
drivers  now  on  the  market.   With  the  increased  interest  in 
fiber  optic  technology  this  option  may  prove  to  be  the  best 
way  to  proceed  in  future  designs. 

One  Bit  Delay 

The one bit delay is a single flip-flop driven at the
recovered clock rate and serves to retrieve the received
signal before retransmission.

Clock  Recovery  Unit 

The recovery unit recovers clocking information from
the incoming data stream. The bitstream in the ring is
self-clocking in that frequent one-zero (and zero-one)
signals are guaranteed during incoming messages. Clock
pulses can be regenerated at these transitions. Data bits
on the ring are sent in 2 bit periods as shown:

    "one" bit   :  1 0
    "zero" bit  :  0 1

During receipt of the tokens (CTL, SOM, EOM) up to
three bit periods may pass with no transitions. The clock
recovery circuitry provides sufficient "inertia" to
continue with minimal frequency drift during these periods.
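
The self-clocking property can be illustrated with a short
Python sketch. It assumes that a "one" is sent as the two-period
pattern 1, 0 and a "zero" as 0, 1; this reading of the bit
diagram is an assumption, as are the function names. Under that
assumption the sketch encodes a data stream and measures the
longest stretch of bit periods without a level transition.

```python
ONE, ZERO = (1, 0), (0, 1)   # assumed two-period line patterns


def encode(bits):
    """Expand each data bit into its two-period line pattern."""
    levels = []
    for bit in bits:
        levels.extend(ONE if bit else ZERO)
    return levels


def longest_run(levels):
    """Longest stretch of bit periods with no level transition."""
    longest = run = 1 if levels else 0
    for prev, cur in zip(levels, levels[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    data = [1, 0, 0, 1, 1, 0]
    line = encode(data)
    print(line, longest_run(line))
```

With this encoding every data bit contains a transition, so
ordinary data never holds the line steady for more than two bit
periods; only the tokens, at up to three periods, require the
recovery circuit to coast longer.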

Crystal  Clock 

The 3.58 MHz "TV" crystal clock mechanism used
in this repeater design is very stable and rather inexpensive.
With proper division it would be used to provide a
reference for the digital phase-locking in the clock
recovery unit.

Output  Multiplexer 

A two-to-one multiplexer is used to route ring data from
the delay flip-flop to the ring or from the ring interface
to the ring. The multiplexed path is controlled by the
connect/disconnect line from the ring interface. Note that
the ring interface in Figure IV-1 "listens" to the passing
data, watching for a CTL token, before entering the ring
(switching the multiplexer to "connect mode"). In this
manner the ring data in line is always valid (when the
repeater is not bypassed) and is derived from the output
of the delay flip-flop.

1.   Availability Analysis of Proposed Ring Repeater
Components

Throughout  earlier  system  availability  discussions 
in  this  thesis,  the  availability  of  the  ring  repeater  has 
always  been  a  central  or  key  element  to  all  calculations. 
Whether  TMR  or  TRMP,  the  degree  of  availability  of  the 
ring  repeater  had  significant  impact  on  the  overall  system 
availability.   Realizing  the  importance  of  the  repeater  in 
availability  calculations,  it  is  vital,  when  designing  a 
repeater  configuration,  to  design  a  unit  that  would  provide 
the  desired  service  and  also  provide  that  service  as  much 
of  the  time  as  possible. 

As can be gathered from previous discussions on the
subject, reliability is the problem at all levels of electronics,
from materials to operating systems. Reliability
of materials is an "involved" topic which will not be discussed
here, but it is sufficient to say that in reliability
of operating systems, it is often found that system reliabilities
have to be known prior to system use. For example,
it can be seen that the reliability of an electronic component
is known with certainty only after it has been used in the field
until it is worn out and its failure history has been recorded.
But this approach has not proven very reasonable
in situations where the demands of technology required
immediate use of electronic components. There was not time
to run life cycle evaluation on the component to judge
its reliability.

There is then a need to be able to predict the reliability
of a component to a very accurate degree. A fundamental
limitation on this prediction process is the ability
to accumulate data of known validity. Another limitation
is exerted by the type of prediction technique used. Very
simple techniques omit a great deal of detail and the prediction
suffers inaccuracies. More detailed techniques can
become so bogged down in detail that the prediction effort
becomes too costly or, worse, causes delay in actual hardware
development.

For the military application there is a set of
definitive guidelines to reliability prediction for electronic
components. These guidelines, found in reference 13,
take into account the specific types of electronic components
used throughout the military and utilize the most acceptable
prediction techniques available.

The reliability prediction formula used in the
following pages comes from reference 13 and reflects the
overall device (component) failure rate for MOS Digital
SSI/MSI devices. From component failure rates, it will be
possible through standard availability calculation methods
to eventually ascertain whether the design will meet requirements
for reliable performance in the TRMP system. A
failure rate will be calculated for each component in the
proposed ring repeater design. These failure rates will
then be transformed into component availabilities. With
these individual availabilities, the overall ring repeater
availability will be calculated.

a.   REPEATER  COMPONENT  FAILURE  RATE  AND  AVAILABILITY 
CALCULATIONS 

For MOS Digital SSI/MSI devices the individual
device failure rate ($\lambda_p$) is specified by:

    $\lambda_p = \pi_L \pi_Q (C_1 \pi_T + C_2 \pi_E)$        (4.1)

where:

$\lambda_p$ - is the device failure rate in failures per 10  hours,
$\pi_L$ - is the device learning factor. It applies values
        based on whether device technology is new or time
        tested (new is usually considered to be less than
        6 months old).
$\pi_Q$ - is the device quality factor. Depending on
        which MIL-SPEC standard, if any, is applied
        to the device it assigns an appropriate value.
$\pi_T$ - is the temperature acceleration factor. Its
        value depends on device technology as well as
        the expected temperature range in the environment
        in which the device will be located.
$\pi_E$ - is the application environment multiplier. Its
        values are determined by the application environment
        in which the device will operate.
$C_1$, $C_2$ - are the circuit complexity factors. Values are
        assigned based on the number of gates within
        the device.

In using the formula for $\lambda_p$, the individual parameter
values are determined from appropriate tables found in
reference 13.

In  Table  IV-1  the  failure  rate  and  availability 
for  each  individual  component  is  calculated. 

For all calculations in this table $\pi_Q$ (device
quality) was set at a value of 150 (commercial or non-MIL-SPEC
parts) with no screening beyond the manufacturer's
regular quality assurance. $\pi_L$ (learning factor) was set
at 1 since any device in production for more than six months
received this rating. $\pi_E$ (environment factor) was set at
5.0 to conform with a Naval unsheltered environment which
would be the worst case encountered on a shipboard platform.
$\pi_T$ (temperature acceleration factor) was set at .25 to
conform with an expected ambient temperature of 90°F, again
a worst case situation. Circuit complexity factors ($C_1$ and
$C_2$) were a function of the individual number of gates per
device and as such varied from device to device.

To illustrate the calculation process, take for
example the line receiver. Signetics Corporation device
DM 8820 was chosen as a representative component for this
calculation. Device DM 8820 has 2 gates which equates to
complexity factors of $C_1$ = .0021 and $C_2$ = .0050. All other
variables being as stated previously, equation 4.1 results
in $\lambda_p$ = 3.82875 failures/10  hours. To find the number of
failures per hour ($\lambda$), $\lambda_p$ is multiplied by 10  . MTBF
is then found by taking the inverse of $\lambda$. Having calculated
component MTBF it is essential to translate this information
into an availability value. The equation
originally considered in determining component availability
was of the form:

    $\text{Availability} = \frac{MTBF}{MTBF + MTTR}$        (4.2)

Direct calculation of the form $A = \frac{MTBF}{MTBF + MTTR}$
is quite simple. When MTTR << MTBF, an approximation is
more practical:

    $\frac{MTBF}{MTBF + MTTR} \approx 1 - \frac{MTTR}{MTBF}$        (4.3)

In  the  case  of  the  proposed  ring  interface  design, 
the  MTTR  is  expected  to  approximate  one  hour.   This  means, 
of  course,  that  MTTR  is  significantly  less  than  MTBF,  so 
the  approximation  form  is  used  in  all  calculations  for 
component  availability. 

To continue with the actual calculations of
component availability, equation 4.3 is applied with MTTR =
1 hour. Device availability, then, is found to be .99996.
What remains is to calculate the overall availability of
the ring repeater using all the individual component
availabilities generated in Table IV-1.
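
The component calculation can be sketched in Python as follows.
The failure rate function uses the form
$\lambda_p = \pi_L \pi_Q (C_1 \pi_T + C_2 \pi_E)$, which
reproduces the worked value of 3.82875 for the line receiver.
The function names are illustrative. The scaling between
$\lambda_p$ and failures per hour is deliberately left out of
the sketch, so the availability functions take an MTBF already
expressed in hours; the MTBF used in the usage lines is an
arbitrary input, not a value from Table IV-1.

```python
def failure_rate(pi_l, pi_q, pi_t, pi_e, c1, c2):
    """Device failure rate in the units of the prediction tables."""
    return pi_l * pi_q * (c1 * pi_t + c2 * pi_e)


def availability_exact(mtbf_hours, mttr_hours):
    """Equation 4.2: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def availability_approx(mtbf_hours, mttr_hours):
    """Equation 4.3: 1 - MTTR / MTBF, intended for MTTR << MTBF."""
    return 1.0 - mttr_hours / mtbf_hours


if __name__ == "__main__":
    # Line receiver parameters as stated in the text.
    lam_p = failure_rate(pi_l=1, pi_q=150, pi_t=0.25, pi_e=5.0,
                         c1=0.0021, c2=0.0050)
    print(lam_p)
    # Compare the exact and approximate forms for a sample MTBF.
    mtbf, mttr = 25000.0, 1.0
    print(availability_exact(mtbf, mttr), availability_approx(mtbf, mttr))
```

When MTTR is several orders of magnitude smaller than MTBF the
two availability forms differ only far beyond the digits quoted
in the text, which is why the approximation was acceptable.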

b.   CALCULATION  OF  THE  PROPOSED  RING  REPEATER 
AVAILABILITY 

From Figure IV-1 it can be seen that repeater
availability is, in fact, a serial calculation. Having
calculated individual component availabilities it is necessary
to first calculate the clock recovery unit availability
from its components and then calculate the parallel availability
of the two data path selector devices.

The clock recovery unit availability is a serial
calculation which results in a unit availability of .99979.
The data path selector device has an individual availability
of .99992. The availability of the parallel combination of
those devices is .999999993. Applying these two availability
values as well as the remaining component availability values
in Table IV-1 to Figure IV-1, the repeater availability is
.99933.
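
The serial and parallel combinations used above follow the
standard rules for independent components. The Python sketch
below states them; the function names are illustrative, and
only the selector figure quoted in the text is used as input,
since the remaining component values of Table IV-1 are not
repeated here.

```python
from math import prod


def series_availability(values):
    """All components must work: the product of the availabilities."""
    return prod(values)


def parallel_availability(values):
    """At least one must work: one minus the product of unavailabilities."""
    return 1.0 - prod(1.0 - a for a in values)


if __name__ == "__main__":
    selector = 0.99992
    print(parallel_availability([selector, selector]))
```

Two selectors of availability .99992 in parallel give
1 - (.00008)^2, in agreement with the .999999993 quoted above.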

2.   Increasing  Repeater  Availability 

While the calculated availability for the proposed
ring repeater is .99933, this value is not as high as
would be desired in practice. There are, of course, several
alternative approaches which can be used to bolster
repeater availability. The first would be to consider
components of higher reliability, as reflected in the higher
levels of the Military Specification (MIL-SPEC) program. It
should be noted that the components used in calculating the
proposed repeater availability were of commercial,
non-MIL-SPEC quality. To illustrate, consider the line receiver
device (DM 8820) used in the proposed repeater design.
In a commercial configuration, it can be seen from
Table III-1 that device availability is .99996. For an
otherwise similar device satisfying the most
rigid MIL-SPEC standards, formula 4.1 yields an availability
of .9999997.
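
To give a feel for what this improvement means in terms of
failure rate, the sketch below assumes that formula 4.1 has
the standard steady-state form A = MTBF / (MTBF + MTTR) and
that the repair time is 1 hour. Both are assumptions, since
the formula is not reproduced in this section. The function
names are hypothetical, and the MTBF values are implied by
those assumptions rather than quoted from the text.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability (assumed form of formula 4.1)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def mtbf_for(target_availability, mttr_hours):
    """Invert the relation above: MTBF needed to reach a target availability."""
    return mttr_hours * target_availability / (1.0 - target_availability)


MTTR_HOURS = 1.0  # assumed repair time

# Availabilities quoted in the text for the line receiver device.
commercial = 0.99996
mil_spec = 0.9999997

commercial_mtbf = mtbf_for(commercial, MTTR_HOURS)
mil_spec_mtbf = mtbf_for(mil_spec, MTTR_HOURS)

print(f"commercial MTBF: about {commercial_mtbf:,.0f} hours")
print(f"MIL-SPEC MTBF:   about {mil_spec_mtbf:,.0f} hours")
```

Under these assumptions the MIL-SPEC device would need a mean
time between failures more than one hundred times that of the
commercial device.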

A second way to increase repeater availability
would be to make the entire design more redundant. That is,
instead of single devices (as presently proposed), install
all devices in parallel. Again, to illustrate the effect,
the existing proposed design, with only the selector device
in a redundant configuration, has a unit availability of
.99933. By installing every device in parallel using the
same commercial quality devices, the resultant repeater
availability would be .9999948.
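
The practical difference between these two figures is easier
to see as expected downtime. The sketch below converts each
availability into hours of downtime per year. It assumes
continuous operation over a 365-day year, and the function
name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumes continuous, year-round operation


def downtime_hours_per_year(availability):
    """Expected hours per year during which the unit is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Repeater availabilities quoted in the text.
selector_only = 0.99933    # only the selector device redundant
fully_parallel = 0.9999948  # every device installed in parallel

for label, a in [("selector-only redundancy", selector_only),
                 ("all devices in parallel", fully_parallel)]:
    hours = downtime_hours_per_year(a)
    print(f"{label}: {hours:.4f} h/year ({hours * 60:.1f} min/year)")

# Factor by which unavailability shrinks with full redundancy.
improvement = (1.0 - selector_only) / (1.0 - fully_parallel)
print(f"unavailability reduced by a factor of about {improvement:.0f}")
```

On these figures, full redundancy reduces expected downtime
from several hours per year to a few minutes per year.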

Further calculations are not necessary to conclude
that even higher availabilities could be achieved through
a combination of more rigid MIL-SPEC criteria and more
redundant configurations.

One last option for enhancing repeater availability
would be the development of a single integrated
circuit device which would accomplish the functions of the
repeater. This approach would significantly reduce the
number of gates as well as the complexity factors and, as a
result, would increase repeater availability. Any attempt
to predict an availability for such a device would be
preliminary, as the final number of gates (complexity) would be
subject to the manufacturer's final design specifications.


In summary, while the availability for the proposed
repeater design is well within limits to ensure high
overall TRMP System availability, there exist several avenues
for enhancement of present or anticipated repeater designs.

V.   SUMMARY  AND  CONCLUSIONS

The main purpose of this thesis was to study the
feasibility of enhancing ring structured distributed computing
systems by introducing a high degree of redundancy in order
to provide the high availability necessary for ensuring
compatibility with shipboard environments. A background study,
overview and functional description of distributed ring
software and hardware were presented in Chapter II. The
major system constraints and environmental problems likely
to be encountered in a shipboard environment were described,
together with potential approaches for solving these
difficulties, in the earlier sections of Chapter III. Specific
designs utilizing data path redundancy for increased system
availability were discussed, and calculations for each design
were explained, in the later sections of Chapter III.

One such data path design, Triply Redundant Multiple
Path (TRMP), provided significantly improved system
availability even when individual components of the system
exhibited unrealistically low availabilities. The TRMP
configuration, when coupled with the distributed, asynchronous
control philosophy of the distributed ring computing system,
would provide a highly reliable approach to network
architecture in shipboard platforms while providing the
full benefits of graceful degradation associated with the
distributed control concept.

While TRMP appears feasible, there are still many
aspects of this design that require further study. It is
recommended that further investigation be conducted in
the areas of system software and communication protocols
for a TRMP type system. In addition, the development of
software diagnostics to aid in fault isolation is key
to the continued development of this concept.
Practical implementations of distributed ring computing
systems can be expanded to include TRMP data path
configurations so that actual system availability data based on
real-time application will be available.

In conclusion, it is felt that TRMP provides the degree
of redundancy needed in ring structured systems to ensure high
availability in shipboard environments. Even at system
repeater and data path availabilities approximating .8,
TRMP delivers significantly higher availabilities than the
other configurations considered. With repeater architecture
as simple in design as it is, a repeater availability at
or exceeding .999 is attainable. Given this level of
component availability, TRMP meets even the highest standards
for overall data communication system availability.